SOAPindel: Efficient identification of indels from short paired reads

  1. Jun Wang (1,5,6,8)
  1. BGI Shenzhen, Shenzhen 518000, China;
  2. Bioinformatics Research Centre, Aarhus University, DK 8000 Aarhus C, Denmark;
  3. Broad Institute, Cambridge, Massachusetts 02142, USA;
  4. Human Genetics, Aarhus University, DK 8000 Aarhus C, Denmark;
  5. The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, 2200 Copenhagen, Denmark;
  6. Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
    • 7 Present address: Biodynamic Optical Imaging Center, and College of Life Sciences, Peking University, Beijing 100871, China.

    Abstract

    We present a new approach to indel calling that explicitly exploits the fact that indel differences between a reference and a sequenced sample make the mapping of reads less efficient. We assign all unmapped reads that have a mapped partner to their expected genomic positions and then perform extensive de novo assembly on the regions with many unmapped reads, resolving homozygous, heterozygous, and complex indels by exhaustive traversal of the de Bruijn graph. The method is implemented in the software SOAPindel, which provides a list of candidate indels with quality scores. We compare SOAPindel to Dindel, Pindel, and GATK on simulated data and find similar or better performance for short indels (<10 bp) and higher sensitivity and specificity for long indels. A validation experiment suggests that SOAPindel has a false-positive rate of ∼10% for long indels (>5 bp), while still providing many more candidate indels than other approaches.
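
    The local assembly idea described in the abstract can be illustrated with a minimal Python sketch. It is not SOAPindel's implementation: the function names (`build_de_bruijn`, `enumerate_paths`, `call_indel`), the toy reference, the k-mer size, and the read length are all assumptions chosen for illustration, and the read-placement and quality-scoring steps are omitted. The sketch builds a de Bruijn graph from the reads in one region, exhaustively enumerates the paths between the two flanking anchors, and compares each assembled haplotype with the reference.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph, start, end, max_len):
    """List every sequence spelled by a path from start to end.

    max_len caps the sequence length so that cycles cannot loop forever.
    """
    results = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end and len(seq) > len(start):
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, seq + nxt[-1]))
    return results


def call_indel(ref, alt):
    """Trim the shared prefix and suffix; return (offset, deleted, inserted)."""
    if ref == alt:
        return None
    shortest = min(len(ref), len(alt))
    p = 0
    while p < shortest and ref[p] == alt[p]:
        p += 1
    s = 0
    while s < shortest - p and ref[-1 - s] == alt[-1 - s]:
        s += 1
    return p, ref[p:len(ref) - s], alt[p:len(alt) - s]


# Toy data (hypothetical): a heterozygous 3-bp deletion of "AAG".
K = 5
REF = "ACGTTGCAAGCTTAGGCATC"
ALT = REF[:7] + REF[10:]
READS = [hap[i:i + 8] for hap in (REF, ALT) for i in range(len(hap) - 7)]

graph = build_de_bruijn(READS, K)
haplotypes = enumerate_paths(graph, REF[:K - 1], REF[-(K - 1):], len(REF) + 5)
calls = [call_indel(REF, h) for h in haplotypes if h != REF]
print(calls)
```

    In this toy region the graph branches once and rejoins, so the traversal recovers both the reference and the deletion haplotype. Real data would additionally require error handling for sequencing errors, repeats, and low coverage.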

    Footnotes

    • 8 Corresponding authors

      E-mail mheide{at}birc.au.dk

      E-mail wangj{at}genomics.cn

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.132480.111.

    • Received September 29, 2011.
    • Accepted September 10, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
